Run t-SNE on CD4+ events which express one or more cytokines. Repeat for Non-CD4 events.
The question being asked is “What are the memory and activation profiles of Ag-specific T cells?”

Stratify by Cohort and Antigen (S1, S2, NCAP, VEMP)
Color by
- Degree of functionality
- CD45RA
- CCR7
- HLA-DR
- CD38
Boxplots?

Also:
- Given the elevated IL2 in DMSO Hosp, it might actually also be interesting to take a look at the DMSO signal.
- Take a look at the difference in non-bg-corrected proportions and bg-corrected-proportions in order to get an idea of what amount of the t-SNE data might be background noise.
- Note that the term Not-CD4 is a bit ambiguous. The “4+” gate contains “CD4+CD8-” events, and the “NOT4+” gate contains “CD4-CD8+/-” events.

library(openCyto)
library(CytoML) # 1.12.0
library(flowCore) # required for description()
library(flowWorkspace) # required for gh_pop_get_data()
library(here)
library(tidyverse)
library(Rtsne)
library(ggplot2)
library(scales)
library(patchwork)
library(hues)
library(RColorBrewer)
library(ggrepel)
source(here::here("scripts/20200604_Helper_Functions.R")) # for distributeEvents() and sampleGatingHierarchy()
date <- 20200617
save_output <- FALSE
rerun_tsne <- FALSE

gs <- load_gs(here::here("out/GatingSets/20200611_HAARVI_ICS_GatingSet_AllBatches"))
## loading R object...
## loading tree object...
## Done
dput(gh_get_pop_paths(gs))
## c("root", "/Time", "/Time/LD-3+", "/Time/LD-3+/1419-3+", "/Time/LD-3+/1419-3+/S", 
## "/Time/LD-3+/1419-3+/S/Lymph", "/Time/LD-3+/1419-3+/S/Lymph/4+", 
## "/Time/LD-3+/1419-3+/S/Lymph/4+/107a", "/Time/LD-3+/1419-3+/S/Lymph/4+/154", 
## "/Time/LD-3+/1419-3+/S/Lymph/4+/CCR7+", "/Time/LD-3+/1419-3+/S/Lymph/4+/CD45RA+", 
## "/Time/LD-3+/1419-3+/S/Lymph/4+/IFNG", "/Time/LD-3+/1419-3+/S/Lymph/4+/IL2", 
## "/Time/LD-3+/1419-3+/S/Lymph/4+/IL17", "/Time/LD-3+/1419-3+/S/Lymph/4+/IL4513", 
## "/Time/LD-3+/1419-3+/S/Lymph/4+/TNF", "/Time/LD-3+/1419-3+/S/Lymph/CD38+", 
## "/Time/LD-3+/1419-3+/S/Lymph/HLADR+", "/Time/LD-3+/1419-3+/S/Lymph/NOT4+", 
## "/Time/LD-3+/1419-3+/S/Lymph/NOT4+/107a", "/Time/LD-3+/1419-3+/S/Lymph/NOT4+/154", 
## "/Time/LD-3+/1419-3+/S/Lymph/NOT4+/CCR7+", "/Time/LD-3+/1419-3+/S/Lymph/NOT4+/CD45RA+", 
## "/Time/LD-3+/1419-3+/S/Lymph/NOT4+/IFNG", "/Time/LD-3+/1419-3+/S/Lymph/NOT4+/IL2", 
## "/Time/LD-3+/1419-3+/S/Lymph/NOT4+/IL17", "/Time/LD-3+/1419-3+/S/Lymph/NOT4+/IL4513", 
## "/Time/LD-3+/1419-3+/S/Lymph/NOT4+/TNF")

CD4

Perform CD4 analysis all the way through first.

Sample events for t-SNE and QC

cd4_gates_for_tsne <- c(
  "/Time/LD-3+/1419-3+/S/Lymph/4+/107a", "/Time/LD-3+/1419-3+/S/Lymph/4+/154", 
  "/Time/LD-3+/1419-3+/S/Lymph/4+/IFNG", "/Time/LD-3+/1419-3+/S/Lymph/4+/IL2", 
  "/Time/LD-3+/1419-3+/S/Lymph/4+/IL17", "/Time/LD-3+/1419-3+/S/Lymph/4+/IL4513", 
  "/Time/LD-3+/1419-3+/S/Lymph/4+/TNF",
  "/Time/LD-3+/1419-3+/S/Lymph/4+/CCR7+", "/Time/LD-3+/1419-3+/S/Lymph/4+/CD45RA+",
  "/Time/LD-3+/1419-3+/S/Lymph/CD38+", "/Time/LD-3+/1419-3+/S/Lymph/HLADR+")
cd4_cytokine_gates <- c("/Time/LD-3+/1419-3+/S/Lymph/4+/107a", "/Time/LD-3+/1419-3+/S/Lymph/4+/154", 
                        "/Time/LD-3+/1419-3+/S/Lymph/4+/IFNG", "/Time/LD-3+/1419-3+/S/Lymph/4+/IL2", 
                        "/Time/LD-3+/1419-3+/S/Lymph/4+/IL17", "/Time/LD-3+/1419-3+/S/Lymph/4+/IL4513", 
                        "/Time/LD-3+/1419-3+/S/Lymph/4+/TNF")

Still need to add boolean gates.

gs_pop_add(gs, eval(substitute(flowWorkspace::booleanFilter(v),
                                       list(v = as.symbol(paste(cd4_cytokine_gates, collapse="|"))))),
           parent = "/Time/LD-3+/1419-3+/S/Lymph/4+", name = "CD4_CytokinePos")
## [1] 29
recompute(gs, "/Time/LD-3+/1419-3+/S/Lymph/4+")
## ..................................................................................................................................................................................................................................................................................................................................................................................................................done!
cd4_cytokine_pos_parentGate <- "/Time/LD-3+/1419-3+/S/Lymph/4+/CD4_CytokinePos"

pop_counts <- pData(gs) %>% 
  left_join(gs_pop_get_count_fast(gs, subpopulations = cd4_cytokine_pos_parentGate),
            by = c("rowname" = "name")) %>% 
  dplyr::rename(CD4_CytokinePos = Count) %>% 
  dplyr::select(rowname, Batch, "SAMPLE ID", STIM, Cohort, CD4_CytokinePos)

37C, 116C (DMSO), and 07B Spike 2 were filtered out for COMPASS due to low cell counts.
37C also has ~2-5% events in Live CD3+ gate. 07B Spike 2 has low-ish 13% Live CD3+ gate membership.
Include them all here, but note that for some of them, the low event counts means they will be under-represented in t-SNE.

head(pop_counts)
##             rowname Batch SAMPLE ID    STIM       Cohort CD4_CytokinePos
## 1 112405.fcs_332150     1     15545    DMSO Hospitalized             694
## 2 112407.fcs_341600     1     15545     SEB Hospitalized           38139
## 3 112409.fcs_398100     1     15545    VEMP Hospitalized             954
## 4 112411.fcs_348925     1     15545 Spike 1 Hospitalized            1022
## 5 112413.fcs_296775     1     15545 Spike 2 Hospitalized            1004
## 6 112415.fcs_378400     1     15545    NCAP Hospitalized            1359
sub_group_size_calculations <- pop_counts %>% 
  dplyr::filter(Cohort != "Healthy control" & !is.na(Cohort) & !(STIM %in% c("DMSO", "SEB"))) %>% # TRIMAS are NA
  group_by(Batch, Cohort, STIM) %>% 
  dplyr::summarise(n_Patients = n(),
                   min_CD4_CytokinePos = min(CD4_CytokinePos),
                   total_CD4_CytokinePos = sum(CD4_CytokinePos),
                   n_x_min_CD4_CytokinePos = n_Patients * min_CD4_CytokinePos)
sub_group_size_calculations %>% 
  arrange(n_x_min_CD4_CytokinePos) %>% 
  head() %>% 
  knitr::kable()
Batch Cohort STIM n_Patients min_CD4_CytokinePos total_CD4_CytokinePos n_x_min_CD4_CytokinePos
2 Non-hospitalized Spike 1 13 13 4317 169
2 Non-hospitalized Spike 2 13 15 1763 195
2 Non-hospitalized NCAP 13 30 3860 390
2 Non-hospitalized VEMP 13 36 11120 468
2 Hospitalized Spike 2 7 120 1495 840
1 Hospitalized Spike 2 6 171 2968 1026

Above: Based on the n_x_min_CD4_CytokinePos column, if we maintained equal representation of the patients within each group (by batch, cohort, and stim) there would only be 169 events per group, leading to a total of only 169x3x4 = 2028 events.
The limiting group is: B2, Non-hospitalized, Spike 1

sub_group_size_calculations %>% 
  arrange(total_CD4_CytokinePos) %>% 
  head() %>% 
  knitr::kable()
Batch Cohort STIM n_Patients min_CD4_CytokinePos total_CD4_CytokinePos n_x_min_CD4_CytokinePos
2 Hospitalized Spike 2 7 120 1495 840
2 Non-hospitalized Spike 2 13 15 1763 195
1 Hospitalized Spike 2 6 171 2968 1026
2 Hospitalized NCAP 7 312 3431 2184
3 Hospitalized Spike 2 7 239 3546 1673
1 Non-hospitalized Spike 2 13 130 3656 1690

Above: If we allow unequal patient representation, we can have up to 1495 events per group.
Let’s settle for 1000x*3x4 = 12000 events total in the t-SNE run. So, 1000 events per group.

Reproducibility is important

set.seed(date)
cd4_cytokinepos_sampleSizes <- pop_counts %>%
  dplyr::filter(Cohort != "Healthy control" & !is.na(Cohort) & !(STIM %in% c("DMSO", "SEB"))) %>% 
  group_by(Batch, Cohort, STIM) %>% 
  nest() %>% 
  ungroup() %>% 
  mutate(nsamp = map2(1000, data, function(totalEvents, df) {
    distributeEvents(totalEvents, df$CD4_CytokinePos)
  })) %>% 
  unnest(cols = c(data, nsamp))

Make sure all groups are the same size

cd4_cytokinepos_sampleSizes %>% 
  group_by(Batch, Cohort, STIM) %>% 
  summarise(nsamp_sum = sum(nsamp)) %>% 
  dplyr::pull(nsamp_sum) %>% 
  unique()
## [1] 1000

But keep in mind that there is lopsided patient representation even within groups:

cd4_cytokinepos_sampleSizes %>% 
  arrange(nsamp) %>% 
  head(n=10)
## # A tibble: 10 x 7
##    Batch STIM    Cohort         rowname        `SAMPLE ID` CD4_CytokinePos nsamp
##    <dbl> <chr>   <chr>          <chr>          <chr>                 <int> <dbl>
##  1     2 Spike 1 Non-hospitali… 113851.fcs_27… 37C                      13    13
##  2     2 Spike 2 Non-hospitali… 113853.fcs_23… 37C                      15    15
##  3     2 NCAP    Non-hospitali… 113855.fcs_20… 37C                      30    30
##  4     2 VEMP    Non-hospitali… 113849.fcs_25… 37C                      36    36
##  5     2 Spike 2 Non-hospitali… 113685.fcs_13… 80C                      55    55
##  6     2 Spike 2 Non-hospitali… 113781.fcs_12… 76C                      61    61
##  7     1 VEMP    Non-hospitali… 112584.fcs_41… 90C                     751    76
##  8     1 Spike 1 Non-hospitali… 112831.fcs_23… 25C                     371    76
##  9     1 Spike 2 Non-hospitali… 112563.fcs_26… 148C                    258    76
## 10     1 NCAP    Non-hospitali… 112515.fcs_38… 21C                     639    76
cd4_cytokinepos_sampleSizes_4plot <- cd4_cytokinepos_sampleSizes %>%
  mutate(the_label = ifelse(nsamp < 50,
                            paste0("Batch ", Batch, ", ", `SAMPLE ID`, ", ", STIM), NA)) %>% 
  mutate(Cohort = factor(Cohort,
                         levels = c("Non-hospitalized", "Hospitalized"),
                         labels = c("Conv\nNon-Hosp", "Conv\nHosp")))
ggplot(cd4_cytokinepos_sampleSizes_4plot,
       aes(factor(Cohort), nsamp, label = the_label)) +
  geom_boxplot(outlier.shape = NA) +
  geom_jitter(width = 0.15, height = 0) +
  theme_bw(base_size=20) +
  labs(title="ICS CD4 t-SNE patient representation",
       y="Events Sampled for t-SNE") +
  geom_text() +
  facet_grid(Batch ~ STIM) +
  theme(axis.title.x = element_blank())
## Warning: Removed 232 rows containing missing values (geom_text).

call_sampleGatingHierarchy_for_cd4 <- function(currentSampleName, currentSampleSize) {
  # print(sprintf("Sampling data from %s", currentSampleName))
  sampleGatingHierarchy(gs[[currentSampleName]], cd4_cytokine_pos_parentGate, currentSampleSize, cd4_gates_for_tsne)
}
cd4_cytokinepos_data4tsne <- map2_dfr(cd4_cytokinepos_sampleSizes$rowname, cd4_cytokinepos_sampleSizes$nsamp,
                                     call_sampleGatingHierarchy_for_cd4)
knitr::kable(head(cd4_cytokinepos_data4tsne))
name EXPERIMENT NAME $DATE SAMPLE ID PATIENT ID STIM WELL ID PLATE NAME filename rowname Sample ID Cohort Age Sex Race Hispanic? Days symptom onset to visit 1 Pair ID Batch /Time/LD-3+/1419-3+/S/Lymph/4+/CD4_CytokinePos /Time/LD-3+/1419-3+/S/Lymph/4+/107a /Time/LD-3+/1419-3+/S/Lymph/4+/154 /Time/LD-3+/1419-3+/S/Lymph/4+/IFNG /Time/LD-3+/1419-3+/S/Lymph/4+/IL2 /Time/LD-3+/1419-3+/S/Lymph/4+/IL17 /Time/LD-3+/1419-3+/S/Lymph/4+/IL4513 /Time/LD-3+/1419-3+/S/Lymph/4+/TNF /Time/LD-3+/1419-3+/S/Lymph/4+/CCR7+ /Time/LD-3+/1419-3+/S/Lymph/4+/CD45RA+ /Time/LD-3+/1419-3+/S/Lymph/CD38+ /Time/LD-3+/1419-3+/S/Lymph/HLADR+ Time FSC-A FSC-H SSC-A SSC-H CD8b BB700 TNFa FITC CD107a PE-Cy7 CD154 PE-Cy5 CD3 ECD IL2 PE CD4 APC-H7 IL17a Ax700 IL4/5/13 APC CD14/CD19 BV785 CCR7 BV711 CD38 BV605 L/D IFNg V450 CD45RA BUV737 HLADR BUV395
112409.fcs 20200528_COVID_ICS-B1 28-MAY-2020 15545 15545 VEMP A06 P2 15545_A6_A06_006.fcs 112409.fcs_398100 NA Hospitalized 63 N/A N/A N/A 47 7 1 1 0 0 0 0 1 0 0 1 0 0 0 21.70000 118285.64 101663 17393.91 16827 934.1588 408.87811 364.8139 1171.5387 2976.845 838.0140 1430.116 1964.217 -57.52203 1153.8206 2771.098 1159.6815 1131.569 854.4024 1796.7207 412.0552
112409.fcs 20200528_COVID_ICS-B1 28-MAY-2020 15545 15545 VEMP A06 P2 15545_A6_A06_006.fcs 112409.fcs_398100 NA Hospitalized 63 N/A N/A N/A 47 7 1 1 1 0 0 0 0 0 0 1 0 0 0 32.80400 140124.23 122385 18028.14 17253 613.5341 13.17943 1851.6812 642.4897 2927.073 723.6606 1414.309 1318.955 609.11902 848.1237 1949.186 286.3737 1627.756 588.3616 816.4142 441.0412
112409.fcs 20200528_COVID_ICS-B1 28-MAY-2020 15545 15545 VEMP A06 P2 15545_A6_A06_006.fcs 112409.fcs_398100 NA Hospitalized 63 N/A N/A N/A 47 7 1 1 1 0 0 0 0 0 0 1 1 1 0 33.86900 90099.52 75064 16750.98 15912 395.2364 593.60278 1733.3856 1106.0293 3065.431 334.5751 1569.841 1247.176 197.42441 655.2477 2033.118 2015.7095 1568.413 778.6832 2952.9709 823.9152
112409.fcs 20200528_COVID_ICS-B1 28-MAY-2020 15545 15545 VEMP A06 P2 15545_A6_A06_006.fcs 112409.fcs_398100 NA Hospitalized 63 N/A N/A N/A 47 7 1 1 1 0 0 0 0 0 0 1 0 0 0 75.61400 75743.88 61429 16034.97 15509 197.2342 756.94232 2077.8496 1138.6351 2937.409 448.6306 1691.632 1319.934 431.49289 955.3243 2010.674 717.8362 1688.434 542.4303 1675.3750 739.0743
112409.fcs 20200528_COVID_ICS-B1 28-MAY-2020 15545 15545 VEMP A06 P2 15545_A6_A06_006.fcs 112409.fcs_398100 NA Hospitalized 63 N/A N/A N/A 47 7 1 1 0 0 0 0 1 0 0 1 0 0 0 80.91700 111324.80 94592 18714.57 18480 923.1359 871.29596 764.8347 770.8456 2747.528 864.4508 1299.245 2121.104 442.95624 1412.8540 2912.742 566.1185 1157.038 269.7643 1595.1530 703.0101
112409.fcs 20200528_COVID_ICS-B1 28-MAY-2020 15545 15545 VEMP A06 P2 15545_A6_A06_006.fcs 112409.fcs_398100 NA Hospitalized 63 N/A N/A N/A 47 7 1 1 1 0 0 0 0 0 0 1 0 0 0 73.48299 118125.28 104746 17660.13 17114 592.4996 352.10556 1711.0443 891.3246 2987.990 255.8075 1429.871 834.546 411.76816 1051.9069 2393.764 1132.9076 1567.465 666.0800 1376.0842 928.6504
dim(cd4_cytokinepos_data4tsne)
## [1] 24000    52

Is there maybe one cytokine that is dominating the entire sample?
If one cytokine has very high background expression (and a generous gate), it could be gated positive in a lot of events.
The high number of events expressing this cytokine could lead to it dominating the data, so that most sampled events are positive for this noisy cytokine. It would drown out real signal from other cytokines.

cytokine_dominance <- cd4_cytokinepos_data4tsne %>% 
  group_by(Batch) %>% 
  summarise_at(cd4_cytokine_gates, sum) %>%
  t() %>%
  as.data.frame() %>%
  set_names(c("B1", "B2", "B3")) %>%
  rownames_to_column("Cytokine_Gate") %>%
  dplyr::filter(Cytokine_Gate != "Batch")
knitr::kable(cytokine_dominance)
Cytokine_Gate B1 B2 B3
/Time/LD-3+/1419-3+/S/Lymph/4+/107a 1642 2105 1658
/Time/LD-3+/1419-3+/S/Lymph/4+/154 2228 3583 2371
/Time/LD-3+/1419-3+/S/Lymph/4+/IFNG 1329 1378 960
/Time/LD-3+/1419-3+/S/Lymph/4+/IL2 2085 2482 2004
/Time/LD-3+/1419-3+/S/Lymph/4+/IL17 2307 273 1804
/Time/LD-3+/1419-3+/S/Lymph/4+/IL4513 799 1694 1397
/Time/LD-3+/1419-3+/S/Lymph/4+/TNF 2066 2979 2366
cytokine_dominance %>%
  pivot_longer(cols = starts_with("B"), names_to = "Batch", values_to = "Events_in_Gate") %>% 
  mutate(Cytokine = sub(".*4\\+\\/(.*)", "\\1", Cytokine_Gate)) %>% 
  ggplot(aes(Cytokine, Events_in_Gate, fill = Cytokine)) +
  theme_bw(base_size=18) +
  geom_bar(stat="identity") +
  facet_grid(. ~ Batch) +
  labs(title = "CD4 t-SNE Run Cytokine Dominance by Batch")

The cytokine imbalance doesn’t look too bad.

Run t-SNE

cols_4_tsne <- c("CD3 ECD", "CD8b BB700", "CD4 APC-H7",
                  "TNFa FITC", "CD107a PE-Cy7", 
                  "CD154 PE-Cy5", "IL2 PE", "IL17a Ax700", 
                  "IL4/5/13 APC", "IFNg V450", 
                  "CCR7 BV711", "CD45RA BUV737",
                  "CD38 BV605", "HLADR BUV395")
cd4.scaled_tsne_input <- cd4_cytokinepos_data4tsne %>% 
  dplyr::select(all_of(cols_4_tsne)) %>% 
  as.matrix() %>% 
  scale() %>% 
  as.data.frame() %>% 
  rename_at(vars(all_of(cols_4_tsne)),function(x) paste0(x,".scaled"))

cd4_cytokinepos_data4tsne <- cbind(cd4_cytokinepos_data4tsne, cd4.scaled_tsne_input)

# t-SNE can take a long time, so there is a rerun_tsne switch
if(rerun_tsne) {
  print("Running t-SNE")
  print(Sys.time())
  cd4_cytokinepos_tsne_out <- cd4_cytokinepos_data4tsne %>% 
    # Run CD3, co-receptor, cytokine, memory, and activation markers through t-SNE
    dplyr::select(all_of(paste0(cols_4_tsne, ".scaled"))) %>% 
    Rtsne::Rtsne(check_duplicates = FALSE)
  print(Sys.time())
  cd4_cytokinepos_dat <- cbind(as.data.frame(cd4_cytokinepos_tsne_out$Y) %>% 
    dplyr::rename(x = V1, y = V2),
    cd4_cytokinepos_data4tsne)
  if(save_output) {
    saveRDS(cd4_cytokinepos_dat, here::here(sprintf("out/tSNE/%s_ICS_CD4_CytokinePos_tSNE.rds", date)))
  }
} else {
  # Assuming t-SNE results are already saved
  print("Loading saved t-SNE run")
  cd4_cytokinepos_dat <- readRDS(here::here(sprintf("out/tSNE/%s_ICS_CD4_CytokinePos_tSNE.rds", date)))
}
## [1] "Loading saved t-SNE run"

Plot t-SNE results

# Arial font setup. Downloaded afms from https://github.com/microsoft/microsoft-r-open/tree/ec3fd89e5fb5794bd8149905c134ad801bb61800
Arial <- Type1Font(family = "Arial",
                   metrics = c(here::here("data/Arial_afm/ArialMT.afm"),
                               here::here("data/Arial_afm/ArialMT-Bold.afm"), 
                               here::here("data/Arial_afm/ArialMT-Italic.afm"),
                               here::here("data/Arial_afm/ArialMT-BoldItalic.afm")))
pdfFonts(Arial = Arial)

boolColorScheme <- c("FALSE" = "#E2E2E2", "TRUE" = "#023FA5")

cd4_cytokinepos_dat <- cd4_cytokinepos_dat %>% 
  mutate(cytokine_degree = rowSums(dplyr::select(., all_of(cd4_cytokine_gates))))
cd4_cytokinepos_dat <- cd4_cytokinepos_dat %>% 
  mutate(Cohort = factor(Cohort, levels = c("Non-hospitalized", "Hospitalized")),
         STIM = factor(STIM, levels = c("Spike 1", "Spike 2", "NCAP", "VEMP")))

stim_labs <- c("S1", "S2", "NCAP", "VEMP") 
names(stim_labs) <- c("Spike 1", "Spike 2", "NCAP", "VEMP")
cohort_labs <- c("Conv\nNon-Hosp", "Conv\nHosp")
names(cohort_labs) <- c("Non-hospitalized", "Hospitalized")

base_tsne_plot <- function(currentColumn, pointSize = 0.2, colorScheme = NA) {
  p <- ggplot(cd4_cytokinepos_dat, aes(x=x, y=y,
                                       colour=if(currentColumn %in% c("Batch", "SAMPLE ID")) {
                                         factor(!!as.name(currentColumn))
                                         } else {
                                           as.logical(!!as.name(currentColumn))
                                           })) +
    geom_point(shape=20, alpha=0.8, size=pointSize) +
    facet_grid(Cohort ~ STIM, switch="y",
               labeller = labeller(Cohort = cohort_labs, STIM = stim_labs)) +
    theme_bw() +
    theme(plot.title=element_text(hjust = 0.5, size=22, face="bold"),
          axis.text=element_blank(),
          axis.line=element_blank(),
          axis.ticks=element_blank(),
          axis.title=element_blank(),
          strip.text=element_text(face="bold", size=22),
          panel.grid.major = element_blank(),
          legend.title=element_text(face="bold", size=14),
          strip.text.x = element_text(margin = margin(0.15,0,0.15,0, "cm")))
  if(!anyNA(colorScheme)) {
    p <- p + scale_color_manual(values = colorScheme)
  }
  if(currentColumn %in% c("Batch", "SAMPLE ID")) {
    p <- p + labs(color = currentColumn) +
      guides(colour = guide_legend(override.aes = list(size=10))) +
      theme(legend.title = element_text(size=15),
            legend.text = element_text(size=15),
            legend.position = "bottom") +
      scale_colour_manual(values=as.character(iwanthue(length(unique(cd4_cytokinepos_dat[,currentColumn])))))
  } else {
    p <- p + theme(legend.position = "none")
  }
  p
}

Look for Batch and Patient sub-clustering

base_tsne_plot("Batch")

base_tsne_plot("SAMPLE ID")

Cytokine, Memory, and Activation expression localization

Now visualize the cytokine, memory, and activation expression localization

for(cg in cd4_gates_for_tsne) {
  print(base_tsne_plot(cg, colorScheme = boolColorScheme) +
    labs(title = sprintf("CD4+Cytokine+ t-SNE\nColored by %s", sub(".*\\/([^(\\/)]+)", "\\1", cg))))
}

Color by degree

table(cd4_cytokinepos_dat$cytokine_degree)
## 
##     1     2     3     4     5     6 
## 16001  2676  3237  1985   100     1
myPalette <- colorRampPalette(rev(brewer.pal(11, "Spectral")))
sc <- scale_colour_gradientn(colours = myPalette(100), limits=c(1, 6))

currentColumn <- "cytokine_degree"
pointSize <- 0.2
ggplot(cd4_cytokinepos_dat, aes(x=x, y=y, colour=!!as.name(currentColumn))) +
  geom_point(shape=20, alpha=0.8, size=pointSize) +
  facet_grid(Cohort ~ STIM, switch="y",
             labeller = labeller(Cohort = cohort_labs, STIM = stim_labs)) +
  labs(colour="Cytokine\nDegree",
       title="CD4+Cytokine+ t-SNE\nColored by Cytokine Polyfunctionality") +
  theme_bw() +
  theme(plot.title=element_text(hjust = 0.5, size=22, face="bold"),
        axis.text=element_blank(),
        axis.line=element_blank(),
        axis.ticks=element_blank(),
        axis.title=element_blank(),
        strip.text=element_text(face="bold", size=22),
        panel.grid.major = element_blank(),
        legend.title=element_text(face="bold", size=15),
        legend.text = element_text(size=13),
        strip.text.x = element_text(margin = margin(0.15,0,0.15,0, "cm")),
        legend.position = "bottom") +
  sc

CD4 Observations

  • In terms of cytokine subpopulations which are more prevalent in a particular stim or hospitalization status, it is easier to identify them using COMPASS. But we can interpret the t-SNE results anyway, and we can see that the t-SNE results reflect our COMPASS findings:
    • VEMP looks different from all the rest of the stims, with a large CD107a subpopulation.
    • TNF, IL2, and CD154 are often co-expressed in the S1, S2, and NCAP panels, sometimes with IFNg. You can see that these events are enriched in the convalescent hospitalized patients (the dotplots from the COMPASS follow-up plots make this point more directly).
    • It is helpful to see that IL4/5/13 and IL17 each seem to form their own clusters (away from the TNF/IL2/CD154 clusters) and are seen across all the stims: however, a lot of these events are likely not above background (i.e. also present in the DMSO condition and potentially noise), as evident from the background-corrected dotplots and FACs plots.
  • Memory:
    • Almost all cytokine positive events are CCR7+ (so, TCM or Naive)
    • Most of the TNF, IL2, and CD154 events look CD45RA- (so, likely TCM). In contrast, the CD107a, IL4/5/13, and IL17 clusters seem to be a mix of CD45RA+ (so, probably Naive) and CD45RA-.
  • Activation:
    • There are many tiny localizations of CD38+ expression, but not a lot of signal overall. We should probably visualize these events in a FACs plot to see if it looks real.
    • HLADR: basically nothing.
  • Polyfunctionality: Most of the polyfunctional signal is in the TNF, IL2, and CD154 area. Makes sense. Also a little bit of dual IL17a+IL4/5/13+ as well.
  • Despite the trends being more quantitative in COMPASS, I think t-SNE allows us to see some of the big-picture results very clearly.

Follow-up:
We should plot the memory breakdown of the cytokine+ events in boxplots.
We could also focus in on either the 1) IL2+ or TNF+ or CD154+ events as a group or 2) the polyfunctional COMPASS subsets, and make boxplots about their memory status.
Depending on how informative we think the activation data are, we can do the same for CD38 and HLADR.

Not-CD4

Perform Not-CD4 analysis.

Sample events for t-SNE and QC

notcd4_gates_for_tsne <- c(
  "/Time/LD-3+/1419-3+/S/Lymph/NOT4+/107a", "/Time/LD-3+/1419-3+/S/Lymph/NOT4+/154", 
  "/Time/LD-3+/1419-3+/S/Lymph/NOT4+/IFNG", "/Time/LD-3+/1419-3+/S/Lymph/NOT4+/IL2", 
  "/Time/LD-3+/1419-3+/S/Lymph/NOT4+/IL17", "/Time/LD-3+/1419-3+/S/Lymph/NOT4+/IL4513", 
  "/Time/LD-3+/1419-3+/S/Lymph/NOT4+/TNF",
  "/Time/LD-3+/1419-3+/S/Lymph/NOT4+/CCR7+", "/Time/LD-3+/1419-3+/S/Lymph/NOT4+/CD45RA+",
  "/Time/LD-3+/1419-3+/S/Lymph/CD38+", "/Time/LD-3+/1419-3+/S/Lymph/HLADR+")
notcd4_cytokine_gates <- c("/Time/LD-3+/1419-3+/S/Lymph/NOT4+/107a", "/Time/LD-3+/1419-3+/S/Lymph/NOT4+/154", 
                        "/Time/LD-3+/1419-3+/S/Lymph/NOT4+/IFNG", "/Time/LD-3+/1419-3+/S/Lymph/NOT4+/IL2", 
                        "/Time/LD-3+/1419-3+/S/Lymph/NOT4+/IL17", "/Time/LD-3+/1419-3+/S/Lymph/NOT4+/IL4513", 
                        "/Time/LD-3+/1419-3+/S/Lymph/NOT4+/TNF")

Still need to add boolean gates.

gs_pop_add(gs, eval(substitute(flowWorkspace::booleanFilter(v),
                                       list(v = as.symbol(paste(notcd4_cytokine_gates, collapse="|"))))),
           parent = "/Time/LD-3+/1419-3+/S/Lymph/NOT4+", name = "NotCD4_CytokinePos")
## [1] 30
recompute(gs) # , "/Time/LD-3+/1419-3+/S/Lymph/NOT4+") # For some reason there is an error if I try to recompute lower in the gating tree
## ..................................................................................................................................................................................................................................................................................................................................................................................................................done!
notcd4_cytokine_pos_parentGate <- "/Time/LD-3+/1419-3+/S/Lymph/NOT4+/NotCD4_CytokinePos"

pop_counts <- pData(gs) %>% 
  left_join(gs_pop_get_count_fast(gs, subpopulations = notcd4_cytokine_pos_parentGate),
            by = c("rowname" = "name")) %>% 
  dplyr::rename(NotCD4_CytokinePos = Count) %>% 
  dplyr::select(rowname, Batch, "SAMPLE ID", STIM, Cohort, NotCD4_CytokinePos)

37C, 116C (DMSO), and 07B Spike 2 were filtered out for COMPASS due to low cell counts.
37C also has ~2-5% events in Live CD3+ gate. 07B Spike 2 has low-ish 13% Live CD3+ gate membership.
Include them all here, but note that for some of them, the low event counts means they will be under-represented in t-SNE.

head(pop_counts)
##             rowname Batch SAMPLE ID    STIM       Cohort NotCD4_CytokinePos
## 1 112405.fcs_332150     1     15545    DMSO Hospitalized                168
## 2 112407.fcs_341600     1     15545     SEB Hospitalized               7813
## 3 112409.fcs_398100     1     15545    VEMP Hospitalized                218
## 4 112411.fcs_348925     1     15545 Spike 1 Hospitalized                174
## 5 112413.fcs_296775     1     15545 Spike 2 Hospitalized                110
## 6 112415.fcs_378400     1     15545    NCAP Hospitalized                270
sub_group_size_calculations <- pop_counts %>% 
  dplyr::filter(Cohort != "Healthy control" & !is.na(Cohort) & !(STIM %in% c("DMSO", "SEB"))) %>% # TRIMAS are NA
  group_by(Batch, Cohort, STIM) %>% 
  dplyr::summarise(n_Patients = n(),
                   min_NotCD4_CytokinePos = min(NotCD4_CytokinePos),
                   total_NotCD4_CytokinePos = sum(NotCD4_CytokinePos),
                   n_x_min_NotCD4_CytokinePos = n_Patients * min_NotCD4_CytokinePos)
sub_group_size_calculations %>% 
  arrange(n_x_min_NotCD4_CytokinePos) %>% 
  head() %>% 
  knitr::kable()
Batch Cohort STIM n_Patients min_NotCD4_CytokinePos total_NotCD4_CytokinePos n_x_min_NotCD4_CytokinePos
2 Non-hospitalized Spike 2 13 29 1129 377
2 Hospitalized Spike 2 7 68 1649 476
3 Hospitalized Spike 2 7 76 1581 532
1 Hospitalized Spike 2 6 93 694 558
1 Non-hospitalized Spike 2 13 43 2558 559
2 Non-hospitalized NCAP 13 43 2011 559

Above: Based on the n_x_min_NotCD4_CytokinePos column, if we maintained equal representation of the patients within each group (by batch, cohort, and stim) there would be 377 events per group, leading to a total of only 377x3x4 = 4524 events.
The limiting group is: B2, Non-hospitalized, Spike 2

sub_group_size_calculations %>% 
  arrange(total_NotCD4_CytokinePos) %>% 
  head() %>% 
  knitr::kable()
Batch Cohort STIM n_Patients min_NotCD4_CytokinePos total_NotCD4_CytokinePos n_x_min_NotCD4_CytokinePos
1 Hospitalized Spike 2 6 93 694 558
1 Hospitalized Spike 1 6 166 1115 996
2 Non-hospitalized Spike 2 13 29 1129 377
1 Hospitalized NCAP 6 169 1373 1014
3 Hospitalized Spike 2 7 76 1581 532
2 Hospitalized Spike 2 7 68 1649 476

Above: If we allow unequal patient representation, we can have up to 694 events per group.
Let’s settle for 694x3x4 = 8328 events total in the t-SNE run.

Reproducibility is important

set.seed(date+1)
notcd4_cytokinepos_sampleSizes <- pop_counts %>%
  dplyr::filter(Cohort != "Healthy control" & !is.na(Cohort) & !(STIM %in% c("DMSO", "SEB"))) %>% 
  group_by(Batch, Cohort, STIM) %>% 
  nest() %>% 
  ungroup() %>% 
  mutate(nsamp = map2(694, data, function(totalEvents, df) {
    distributeEvents(totalEvents, df$NotCD4_CytokinePos)
  })) %>% 
  unnest(cols = c(data, nsamp))

Make sure all groups are the same size

notcd4_cytokinepos_sampleSizes %>% 
  group_by(Batch, Cohort, STIM) %>% 
  summarise(nsamp_sum = sum(nsamp)) %>% 
  dplyr::pull(nsamp_sum) %>% 
  unique()
## [1] 694

But keep in mind that there is lopsided patient representation even within groups:

notcd4_cytokinepos_sampleSizes %>% 
  arrange(nsamp) %>% 
  head(n=10)
## # A tibble: 10 x 7
##    Batch STIM    Cohort        rowname       `SAMPLE ID` NotCD4_CytokineP… nsamp
##    <dbl> <chr>   <chr>         <chr>         <chr>                   <int> <dbl>
##  1     2 Spike 2 Non-hospital… 113721.fcs_2… 161C                       29    29
##  2     2 Spike 2 Non-hospital… 113781.fcs_1… 76C                        37    37
##  3     2 Spike 2 Non-hospital… 113853.fcs_2… 37C                        41    41
##  4     1 Spike 2 Non-hospital… 112821.fcs_1… 113C                       43    43
##  5     2 NCAP    Non-hospital… 113723.fcs_2… 161C                       43    43
##  6     2 Spike 1 Non-hospital… 113671.fcs_1… 59C                        46    46
##  7     2 Spike 2 Non-hospital… 113709.fcs_1… 157C                       46    46
##  8     1 NCAP    Non-hospital… 112823.fcs_1… 113C                       51    51
##  9     1 VEMP    Non-hospital… 112459.fcs_3… 112C                      732    53
## 10     1 VEMP    Non-hospital… 112534.fcs_2… 56C                       227    53
notcd4_cytokinepos_sampleSizes_4plot <- notcd4_cytokinepos_sampleSizes %>%
  mutate(the_label = ifelse(nsamp < 25,
                            paste0("Batch ", Batch, ", ", `SAMPLE ID`, ", ", STIM), NA)) %>% 
  mutate(Cohort = factor(Cohort,
                         levels = c("Non-hospitalized", "Hospitalized"),
                         labels = c("Conv\nNon-Hosp", "Conv\nHosp")))
ggplot(notcd4_cytokinepos_sampleSizes_4plot,
       aes(factor(Cohort), nsamp, label = the_label)) +
  geom_boxplot(outlier.shape = NA) +
  geom_jitter(width = 0.15, height = 0) +
  theme_bw(base_size=20) +
  labs(title="ICS Not-CD4 t-SNE patient representation",
       y="Events Sampled for t-SNE") +
  geom_text_repel(size=4) +
  facet_grid(Batch ~ STIM) +
  theme(axis.title.x = element_blank())
## Warning: Removed 236 rows containing missing values (geom_text_repel).

call_sampleGatingHierarchy_for_cd4 <- function(currentSampleName, currentSampleSize) {
  # print(sprintf("Sampling data from %s", currentSampleName))
  sampleGatingHierarchy(gs[[currentSampleName]], notcd4_cytokine_pos_parentGate, currentSampleSize, notcd4_gates_for_tsne)
}
notcd4_cytokinepos_data4tsne <- map2_dfr(notcd4_cytokinepos_sampleSizes$rowname, notcd4_cytokinepos_sampleSizes$nsamp,
                                     call_sampleGatingHierarchy_for_cd4)
knitr::kable(head(notcd4_cytokinepos_data4tsne))
name EXPERIMENT NAME $DATE SAMPLE ID PATIENT ID STIM WELL ID PLATE NAME filename rowname Sample ID Cohort Age Sex Race Hispanic? Days symptom onset to visit 1 Pair ID Batch /Time/LD-3+/1419-3+/S/Lymph/NOT4+/NotCD4_CytokinePos /Time/LD-3+/1419-3+/S/Lymph/NOT4+/107a /Time/LD-3+/1419-3+/S/Lymph/NOT4+/154 /Time/LD-3+/1419-3+/S/Lymph/NOT4+/IFNG /Time/LD-3+/1419-3+/S/Lymph/NOT4+/IL2 /Time/LD-3+/1419-3+/S/Lymph/NOT4+/IL17 /Time/LD-3+/1419-3+/S/Lymph/NOT4+/IL4513 /Time/LD-3+/1419-3+/S/Lymph/NOT4+/TNF /Time/LD-3+/1419-3+/S/Lymph/NOT4+/CCR7+ /Time/LD-3+/1419-3+/S/Lymph/NOT4+/CD45RA+ /Time/LD-3+/1419-3+/S/Lymph/CD38+ /Time/LD-3+/1419-3+/S/Lymph/HLADR+ Time FSC-A FSC-H SSC-A SSC-H CD8b BB700 TNFa FITC CD107a PE-Cy7 CD154 PE-Cy5 CD3 ECD IL2 PE CD4 APC-H7 IL17a Ax700 IL4/5/13 APC CD14/CD19 BV785 CCR7 BV711 CD38 BV605 L/D IFNg V450 CD45RA BUV737 HLADR BUV395
112409.fcs 20200528_COVID_ICS-B1 28-MAY-2020 15545 15545 VEMP A06 P2 15545_A6_A06_006.fcs 112409.fcs_398100 NA Hospitalized 63 N/A N/A N/A 47 7 1 1 0 0 0 0 0 1 0 0 0 0 0 79.665 196105.84 175892 26247.90 25024 1083.7605 277.8082 771.0317 784.7178 2749.721 647.3281 789.1005 542.66034 1592.3297 935.7404 1228.624 1266.1050 1142.830 1298.8314 1688.259 646.1108
112409.fcs 20200528_COVID_ICS-B1 28-MAY-2020 15545 15545 VEMP A06 P2 15545_A6_A06_006.fcs 112409.fcs_398100 NA Hospitalized 63 N/A N/A N/A 47 7 1 1 1 0 0 0 0 0 0 1 1 0 0 30.116 75383.64 64944 22514.73 22004 2342.5312 487.2120 1666.8464 761.1755 2921.429 673.2722 968.7139 16.10978 715.2782 967.5475 2345.441 1763.9929 1403.547 511.6031 2758.407 471.0782
112409.fcs 20200528_COVID_ICS-B1 28-MAY-2020 15545 15545 VEMP A06 P2 15545_A6_A06_006.fcs 112409.fcs_398100 NA Hospitalized 63 N/A N/A N/A 47 7 1 1 1 0 0 0 0 0 0 1 1 0 0 58.746 94916.40 79221 20342.34 19813 1773.9001 878.4894 1494.9532 734.5793 3022.746 686.4818 668.7349 1090.44946 714.0598 1085.7114 2553.222 1121.8394 1306.659 909.5369 2869.590 903.2674
112409.fcs 20200528_COVID_ICS-B1 28-MAY-2020 15545 15545 VEMP A06 P2 15545_A6_A06_006.fcs 112409.fcs_398100 NA Hospitalized 63 N/A N/A N/A 47 7 1 1 1 0 0 0 0 0 0 1 1 0 0 18.835 82572.48 67079 19344.45 17923 1970.5321 682.2386 1464.9182 777.9257 2941.473 932.1238 1432.1625 1199.49841 593.9264 100.4818 2349.360 1561.6705 1410.481 619.6442 2923.428 121.6938
112409.fcs 20200528_COVID_ICS-B1 28-MAY-2020 15545 15545 VEMP A06 P2 15545_A6_A06_006.fcs 112409.fcs_398100 NA Hospitalized 63 N/A N/A N/A 47 7 1 1 0 0 0 0 1 0 0 1 1 0 0 29.098 112103.04 97223 17342.58 17235 1474.5793 479.9280 960.8807 726.4930 2948.384 967.5815 480.7235 2432.53735 495.7607 1064.9620 3127.380 721.6462 1170.589 877.3016 2227.620 516.9161
112409.fcs 20200528_COVID_ICS-B1 28-MAY-2020 15545 15545 VEMP A06 P2 15545_A6_A06_006.fcs 112409.fcs_398100 NA Hospitalized 63 N/A N/A N/A 47 7 1 1 1 0 0 0 0 0 0 1 1 0 0 78.961 108280.24 91731 26836.89 26899 556.5309 384.6051 1736.7019 557.8781 2830.106 455.6583 1041.6759 812.09521 593.9484 924.6109 2016.543 1814.5647 1608.646 668.9354 2566.447 1279.6642
dim(notcd4_cytokinepos_data4tsne)
## [1] 16656    52

Is there maybe one cytokine that is dominating the entire sample?
If one cytokine has very high background expression (and a generous gate), it could be gated positive in a lot of events.
The high number of events expressing this cytokine could lead to it dominating the data, so that most sampled events are positive for this noisy cytokine. It would drown out real signal from other cytokines.

cytokine_dominance <- notcd4_cytokinepos_data4tsne %>% 
  group_by(Batch) %>% 
  summarise_at(notcd4_cytokine_gates, sum) %>%
  t() %>%
  as.data.frame() %>%
  set_names(c("B1", "B2", "B3")) %>%
  rownames_to_column("Cytokine_Gate") %>%
  dplyr::filter(Cytokine_Gate != "Batch")
knitr::kable(cytokine_dominance)
Cytokine_Gate B1 B2 B3
/Time/LD-3+/1419-3+/S/Lymph/NOT4+/107a 1425 2173 1719
/Time/LD-3+/1419-3+/S/Lymph/NOT4+/154 240 414 153
/Time/LD-3+/1419-3+/S/Lymph/NOT4+/IFNG 821 877 575
/Time/LD-3+/1419-3+/S/Lymph/NOT4+/IL2 539 710 529
/Time/LD-3+/1419-3+/S/Lymph/NOT4+/IL17 2546 793 1940
/Time/LD-3+/1419-3+/S/Lymph/NOT4+/IL4513 734 1450 939
/Time/LD-3+/1419-3+/S/Lymph/NOT4+/TNF 459 732 512
cytokine_dominance %>%
  pivot_longer(cols = starts_with("B"), names_to = "Batch", values_to = "Events_in_Gate") %>% 
  mutate(Cytokine = sub(".*4\\+\\/(.*)", "\\1", Cytokine_Gate)) %>% 
  ggplot(aes(Cytokine, Events_in_Gate, fill = Cytokine)) +
  theme_bw(base_size=18) +
  geom_bar(stat="identity") +
  facet_grid(. ~ Batch) +
  labs(title = "Not-CD4 t-SNE Run Cytokine Dominance by Batch")

IL17 seems to be dominating Batch 1. We know that there is high IL17 DMSO background, so we’re likely including cells which were not stimulated by an experimental peptide.

Run t-SNE

cols_4_tsne <- c("CD3 ECD", "CD8b BB700", "CD4 APC-H7",
                  "TNFa FITC", "CD107a PE-Cy7", 
                  "CD154 PE-Cy5", "IL2 PE", "IL17a Ax700", 
                  "IL4/5/13 APC", "IFNg V450", 
                  "CCR7 BV711", "CD45RA BUV737",
                  "CD38 BV605", "HLADR BUV395")
notcd4.scaled_tsne_input <- notcd4_cytokinepos_data4tsne %>% 
  dplyr::select(all_of(cols_4_tsne)) %>% 
  as.matrix() %>% 
  scale() %>% 
  as.data.frame() %>% 
  rename_at(vars(all_of(cols_4_tsne)),function(x) paste0(x,".scaled"))

notcd4_cytokinepos_data4tsne <- cbind(notcd4_cytokinepos_data4tsne, notcd4.scaled_tsne_input)

# t-SNE can take a long time, so there is a rerun_tsne switch
if(rerun_tsne) {
  print("Running t-SNE")
  print(Sys.time())
  notcd4_cytokinepos_tsne_out <- notcd4_cytokinepos_data4tsne %>% 
    # Run CD3, co-receptor, cytokine, memory, and activation markers through t-SNE
    dplyr::select(all_of(paste0(cols_4_tsne, ".scaled"))) %>% 
    Rtsne::Rtsne(check_duplicates = FALSE)
  print(Sys.time())
  notcd4_cytokinepos_dat <- cbind(as.data.frame(notcd4_cytokinepos_tsne_out$Y) %>% 
    dplyr::rename(x = V1, y = V2),
    notcd4_cytokinepos_data4tsne)
  if(save_output) {
    saveRDS(notcd4_cytokinepos_dat, here::here(sprintf("out/tSNE/%s_ICS_NotCD4_CytokinePos_tSNE.rds", date)))
  }
} else {
  # Assuming t-SNE results are already saved
  print("Loading saved t-SNE run")
  notcd4_cytokinepos_dat <- readRDS(here::here(sprintf("out/tSNE/%s_ICS_NotCD4_CytokinePos_tSNE.rds", date)))
}
## [1] "Loading saved t-SNE run"

Plot t-SNE results

# Arial font setup. Downloaded afms from https://github.com/microsoft/microsoft-r-open/tree/ec3fd89e5fb5794bd8149905c134ad801bb61800
Arial <- Type1Font(family = "Arial",
                   metrics = c(here::here("data/Arial_afm/ArialMT.afm"),
                               here::here("data/Arial_afm/ArialMT-Bold.afm"), 
                               here::here("data/Arial_afm/ArialMT-Italic.afm"),
                               here::here("data/Arial_afm/ArialMT-BoldItalic.afm")))
pdfFonts(Arial = Arial)

boolColorScheme <- c("FALSE" = "#E2E2E2", "TRUE" = "#023FA5")

notcd4_cytokinepos_dat <- notcd4_cytokinepos_dat %>% 
  mutate(cytokine_degree = rowSums(dplyr::select(., all_of(notcd4_cytokine_gates))))
notcd4_cytokinepos_dat <- notcd4_cytokinepos_dat %>% 
  mutate(Cohort = factor(Cohort, levels = c("Non-hospitalized", "Hospitalized")),
         STIM = factor(STIM, levels = c("Spike 1", "Spike 2", "NCAP", "VEMP")))

stim_labs <- c("S1", "S2", "NCAP", "VEMP") 
names(stim_labs) <- c("Spike 1", "Spike 2", "NCAP", "VEMP")
cohort_labs <- c("Conv\nNon-Hosp", "Conv\nHosp")
names(cohort_labs) <- c("Non-hospitalized", "Hospitalized")

base_tsne_plot <- function(currentColumn, pointSize = 0.2, colorScheme = NA) {
  p <- ggplot(notcd4_cytokinepos_dat, aes(x=x, y=y,
                                       colour=if(currentColumn %in% c("Batch", "SAMPLE ID")) {
                                         factor(!!as.name(currentColumn))
                                         } else {
                                           as.logical(!!as.name(currentColumn))
                                           })) +
    geom_point(shape=20, alpha=0.8, size=pointSize) +
    facet_grid(Cohort ~ STIM, switch="y",
               labeller = labeller(Cohort = cohort_labs, STIM = stim_labs)) +
    theme_bw() +
    theme(plot.title=element_text(hjust = 0.5, size=22, face="bold"),
          axis.text=element_blank(),
          axis.line=element_blank(),
          axis.ticks=element_blank(),
          axis.title=element_blank(),
          strip.text=element_text(face="bold", size=22),
          panel.grid.major = element_blank(),
          legend.title=element_text(face="bold", size=14),
          strip.text.x = element_text(margin = margin(0.15,0,0.15,0, "cm")))
  if(!anyNA(colorScheme)) {
    p <- p + scale_color_manual(values = colorScheme)
  }
  if(currentColumn %in% c("Batch", "SAMPLE ID")) {
    p <- p + labs(color = currentColumn) +
      guides(colour = guide_legend(override.aes = list(size=10))) +
      theme(legend.title = element_text(size=15),
            legend.text = element_text(size=15),
            legend.position = "bottom") +
      scale_colour_manual(values=as.character(iwanthue(length(unique(notcd4_cytokinepos_dat[,currentColumn])))))
  } else {
    p <- p + theme(legend.position = "none")
  }
  p
}

Look for Batch and Patient sub-clustering

base_tsne_plot("Batch")

base_tsne_plot("SAMPLE ID")

Cytokine, Memory, and Activation expression localization

Now visualize the cytokine, memory, and activation expression localization

for(cg in notcd4_gates_for_tsne) {
  print(base_tsne_plot(cg, colorScheme = boolColorScheme) +
    labs(title = sprintf("NotCD4+Cytokine+ t-SNE\nColored by %s", sub(".*\\/([^(\\/)]+)", "\\1", cg))))
}

Color by degree

table(notcd4_cytokinepos_dat$cytokine_degree)
## 
##     1     2     3     4     5     6 
## 14285  1423   674   245    27     2
myPalette <- colorRampPalette(rev(brewer.pal(11, "Spectral")))
sc <- scale_colour_gradientn(colours = myPalette(100), limits=c(1, 6))

currentColumn <- "cytokine_degree"
pointSize <- 0.2
ggplot(notcd4_cytokinepos_dat, aes(x=x, y=y, colour=!!as.name(currentColumn))) +
  geom_point(shape=20, alpha=0.8, size=pointSize) +
  facet_grid(Cohort ~ STIM, switch="y",
             labeller = labeller(Cohort = cohort_labs, STIM = stim_labs)) +
  labs(colour="Cytokine\nDegree",
       title="NotCD4+Cytokine+ t-SNE\nColored by Cytokine Polyfunctionality") +
  theme_bw() +
  theme(plot.title=element_text(hjust = 0.5, size=22, face="bold"),
        axis.text=element_blank(),
        axis.line=element_blank(),
        axis.ticks=element_blank(),
        axis.title=element_blank(),
        strip.text=element_text(face="bold", size=22),
        panel.grid.major = element_blank(),
        legend.title=element_text(face="bold", size=15),
        legend.text = element_text(size=13),
        strip.text.x = element_text(margin = margin(0.15,0,0.15,0, "cm")),
        legend.position = "bottom") +
  sc

Color t-SNE plots by CD4 and CD8 scaled mfi

Since the Not-CD4 gate contains both CD4-high, CD4-low, CD8-high, and CD8-low events, show which events fall in which co-receptor subset.

notcd4_cytokinepos_dat %>%
  ggplot(aes(x=`CD4 APC-H7.scaled`, color=factor(Batch), fill=factor(Batch))) +
  geom_histogram(alpha=0.2, position="identity", bins=60) +
  geom_vline(xintercept = -3) +
  geom_vline(xintercept = 3) +
  coord_cartesian(xlim = c(-5, 5))

notcd4_cytokinepos_dat %>%
  ggplot(aes(x=`CD8b BB700.scaled`, color=factor(Batch), fill=factor(Batch))) +
  geom_histogram(alpha=0.2, position="identity", bins=720) +
  geom_vline(xintercept = -1.1) +
  geom_vline(xintercept = 1.1) +
  coord_cartesian(xlim = c(-2, 2))

myPalette <- colorRampPalette(rev(brewer.pal(11, "Spectral")))

map2(.x = c("CD4 APC-H7.scaled", "CD8b BB700.scaled"),
     .y = list(c(-3, 3), c(-1.1, 1.1)),
     .f = function(currentColumn, currentLimits) {
        pointSize <- 0.4
        ggplot(notcd4_cytokinepos_dat, aes(x=x, y=y, colour=!!as.name(currentColumn))) +
          geom_point(shape=20, alpha=0.8, size=pointSize) +
          facet_grid(Cohort ~ STIM, switch="y",
                     labeller = labeller(Cohort = cohort_labs, STIM = stim_labs)) +
          labs(colour=currentColumn,
               title=sprintf("NotCD4+Cytokine+ t-SNE\nColored by %s mfi", currentColumn)) +
          theme_bw() +
          theme(plot.title=element_text(hjust = 0.5, size=22, face="bold"),
                axis.text=element_blank(),
                axis.line=element_blank(),
                axis.ticks=element_blank(),
                axis.title=element_blank(),
                strip.text=element_text(face="bold", size=22),
                panel.grid.major = element_blank(),
                legend.title=element_text(face="bold", size=15),
                legend.text = element_text(size=11),
                strip.text.x = element_text(margin = margin(0.15,0,0.15,0, "cm")),
                legend.position = "bottom") +
          # values lower or higher than the limits get converted to min(limits) and max(limits), respectively
          scale_colour_gradientn(colours = myPalette(100), oob=squish, limits=currentLimits)
     })
## [[1]]

## 
## [[2]]

Not-CD4 Observations:

  • Differences across Stims:
    • CD107a more hghly present in VEMP (confirmed by COMPASS)
    • CD107a+IFNg+TNF+ less expressed in VEMP (confirmed by COMPASS)
    • IFNg and TNF highest in NCAP (confirmed by COMPASS)
    • IL2 less expressed in VEMP
    • IL17 and IL4/5/13 seen all stims (but unfortunately much of this is likely noise, as previously discussed)
  • By hospitalization status:
    • No big differences. The COMPASS follow-up dotplots also did not show any significant difference in subpopulation background-corrected magnitudes by status.
  • Memory
    • Unlike in CD4s where CCR7 was expressed by a majority events, CCR7 is probably expressed in ~50% of Not-CD4 events.
    • CD45RA looks like it’s expressed by ~40-50% of events, a little higher than with CD4s, but less localized.
  • Activation
    • Smattering of CD38. Small CD38+ population in NCAP which also expresses TNF and IFNG?
    • Only a tiny bit of HLA-DR (looks like it might be CD107a or IL2 positive)